8 research outputs found

    Analyzing crowd workers' learning behavior to obtain more reliable labels

    Crowdsourcing is a popular means to obtain high-quality labels for datasets at moderate cost. These crowdsourced datasets are then used for training supervised or semi-supervised predictors, which implies that the performance of the resulting predictors depends on the quality/reliability of the labels the crowd workers assigned: low reliability usually leads to poorly performing predictors. In practice, label reliability in crowdsourced datasets varies substantially depending on multiple factors, such as the difficulty of the labeling task at hand, the characteristics and motivation of the participating crowd workers, or the difficulty of the documents to be labeled. Different approaches exist to mitigate the effects of these factors, for example identifying spammers based on their annotation times and removing their submitted labels. To complement existing approaches for improving label reliability in crowdsourcing, this thesis explores label reliability from two perspectives: first, how the label reliability of crowd workers develops over time during an actual labeling task, and second, how it is affected by the difficulty of the documents to be labeled. We find that the label reliability of crowd workers increases after they have labeled a certain number of documents. Motivated by our finding that label reliability is lower for more difficult documents, we propose a new crowdsourcing methodology to improve label reliability: given an unlabeled dataset to be crowdsourced, we first train a difficulty predictor on a small seed set; the predictor then estimates the difficulty of the remaining unlabeled documents. This procedure may be repeated until the performance of the difficulty predictor is sufficient. Ultimately, difficult documents are separated from the rest, so that only the easier documents are crowdsourced. Our experiments demonstrate the feasibility of this method.
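
    The proposed workflow can be illustrated with a short sketch. The thesis does not fix a concrete model here, so the feature representation (TF-IDF) and the classifier (logistic regression) below are assumptions chosen purely for illustration.

        # Minimal sketch: train a difficulty predictor on a small labeled seed set,
        # then keep only the documents predicted to be easy for crowdsourcing.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression

        def split_by_difficulty(seed_texts, seed_is_difficult, unlabeled_texts, threshold=0.5):
            vectorizer = TfidfVectorizer()
            X_seed = vectorizer.fit_transform(seed_texts)
            predictor = LogisticRegression(max_iter=1000).fit(X_seed, seed_is_difficult)

            X_rest = vectorizer.transform(unlabeled_texts)
            p_difficult = predictor.predict_proba(X_rest)[:, 1]

            easy = [t for t, p in zip(unlabeled_texts, p_difficult) if p < threshold]
            hard = [t for t, p in zip(unlabeled_texts, p_difficult) if p >= threshold]
            return easy, hard  # only the easy documents are sent to the crowd

    In the iterative variant described in the abstract, the seed set would be extended and the predictor retrained until its performance is deemed sufficient.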

    Discovering the prerequisite relationships among instructional videos from subtitles

    Nowadays, students prefer to complement their studies with online video materials. While many video e-learning resources are available on the internet, the video sharing platforms that provide them, such as YouTube, do not structure the presented materials in a prerequisite order. As a result, learners cannot use the existing materials effectively because they do not know in which order the materials should be studied. Our aim is to overcome this limitation of existing video sharing systems and improve the learning experience of their users by discovering prerequisite relationships among videos, so that basic materials are covered before more advanced ones. Experiments performed on commonly used gold standard datasets show the effectiveness of the proposed approach, which utilizes measures based on phrase similarity scores.
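
    As a rough illustration of the idea, one can score pairs of videos by an asymmetric phrase overlap over their subtitles; this simple heuristic is an assumption for demonstration and not necessarily the measure used in the paper.

        # Hypothetical heuristic: video A is a prerequisite of video B if B reuses
        # noticeably more of A's subtitle phrases than the other way around.
        def phrase_overlap(source_phrases, target_phrases):
            """Fraction of the target's phrases that already appear in the source."""
            source, target = set(source_phrases), set(target_phrases)
            return len(source & target) / len(target) if target else 0.0

        def is_prerequisite(phrases_a, phrases_b, margin=0.1):
            return phrase_overlap(phrases_a, phrases_b) - phrase_overlap(phrases_b, phrases_a) > margin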

    Predicting worker disagreement for more effective crowd labeling

    Crowdsourcing is a popular mechanism for labeling tasks that produce large training corpora. However, producing a reliable crowd-labeled training corpus is challenging and resource-consuming. Research on crowdsourcing has shown that label quality is strongly affected by worker engagement and expertise. In this study, we postulate that label quality can also be affected by the inherent ambiguity of the documents to be labeled. Such ambiguities are not known in advance, of course, but once encountered by the workers they lead to disagreement in the labeling, a disagreement that cannot be resolved by employing more workers. To deal with this problem, we propose a crowd labeling framework: we train a disagreement predictor on a small seed set of documents, and then use this predictor to decide which documents of the complete corpus should be labeled and which should be checked for document-inherent ambiguities before assigning (and potentially wasting) worker effort on them. We report on the findings of experiments we conducted on crowdsourcing a Twitter corpus for sentiment classification.
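
    A minimal sketch of the framework follows; the representation (TF-IDF) and the predictor (a random forest) are assumptions, since the paper does not prescribe a specific model.

        # Fit a disagreement predictor on the seed set, then decide per document
        # whether to crowd-label it directly or first check it for inherent ambiguity.
        from sklearn.pipeline import make_pipeline
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.ensemble import RandomForestClassifier

        def route_documents(seed_texts, seed_disagreement, corpus_texts):
            model = make_pipeline(TfidfVectorizer(), RandomForestClassifier(n_estimators=200))
            model.fit(seed_texts, seed_disagreement)  # 1 = workers disagreed, 0 = workers agreed
            decisions = model.predict(corpus_texts)
            to_label = [t for t, d in zip(corpus_texts, decisions) if d == 0]
            to_check = [t for t, d in zip(corpus_texts, decisions) if d == 1]
            return to_label, to_check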

    How to provide developers only with relevant information?

    After the release of a new software version, it is difficult for individual developers to keep track of all newly submitted bug reports, which complicates their decision making, e.g., which bug to resolve next. This problem is exacerbated by the presence of further information sources, such as social media, which offer developers valuable user feedback regarding the software. However, due to the sheer amount of information, developers might never notice this feedback. Hence, we envision a real-time system that provides developers with relevant information for improving the quality of their system while filtering out irrelevant facts from multiple information sources. For this system to work, it is necessary to compute the similarity between different types of documents, e.g., tweets and bug reports, in order to detect whether they are relevant to a developer or not. In this feasibility study, we analyze this core assumption in a simplified scenario in which we identify related bugs for a given software fix with the help of Natural Language Processing methods. In this experimental setting, which exhibits the key characteristics of our envisioned system, we obtain promising results indicating that our approach is feasible.
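
    The core step, retrieving bug reports that are textually similar to a given fix description, can be sketched as follows; TF-IDF with cosine similarity is assumed here, and the study's actual NLP pipeline may differ.

        # Rank bug reports by their cosine similarity to a fix description.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        def rank_related_bugs(fix_description, bug_reports, top_k=5):
            vectorizer = TfidfVectorizer(stop_words="english")
            matrix = vectorizer.fit_transform([fix_description] + bug_reports)
            scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
            return sorted(range(len(bug_reports)), key=lambda i: scores[i], reverse=True)[:top_k]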

    Exploration of video e-learning content with smartphones

    Nowadays, computer users prefer to learn or complement their studies with video materials. While many video e-learning resources are available on the internet, the video sharing platforms that provide them, such as YouTube, do not structure the presented material in a prerequisite order. Furthermore, they do not take the users' background into account when recommending the next material to watch. Our aim is to overcome these limitations of existing video-on-demand systems. In this paper we describe the architecture of the e-learning system we are developing, which allows users to search and watch video materials organized with respect to their background and presented in prerequisite order. One of the key features of our e-learning platform is to enable users to explore the video content with mobile devices. We propose a new list-based visual metaphor for mobile devices that reflects the prerequisite graph structure and utilizes the limited screen size more effectively.
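
    How such a prerequisite order can be produced for the list view is sketched below; the platform's actual ordering logic is not detailed in the abstract, so a plain topological sort of the prerequisite graph is assumed.

        # Linearize videos so that every prerequisite appears before the videos
        # that depend on it (requires Python 3.9+ for graphlib).
        from graphlib import TopologicalSorter

        def prerequisite_order(prerequisites):
            """prerequisites: dict mapping each video to the set of videos it depends on."""
            return list(TopologicalSorter(prerequisites).static_order())

        # Example (hypothetical video titles):
        # prerequisite_order({"Integrals": {"Derivatives"}, "Derivatives": {"Limits"}, "Limits": set()})
        # -> ["Limits", "Derivatives", "Integrals"]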

    SteM at SemEval-2016 task 4: applying active learning to improve sentiment classification

    This paper describes our approach to SemEval-2016 task 4, “Sentiment Analysis in Twitter”, where we participated in subtask A. Our system relies on AlchemyAPI and SentiWordNet to create 43 features, from which we select a subset as the final representation. Active learning then filters out noisy tweets from the provided training set, leaving a smaller set of only 900 tweets, which we use to train a Multinomial Naive Bayes classifier that predicts the labels of the test set with an F1 score of 0.478.
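
    A rough sketch of the final training step is given below. It assumes the 43 features have already been extracted (the AlchemyAPI and SentiWordNet calls are omitted) and the noisy tweets have already been removed by active learning; the size of the selected feature subset is an arbitrary placeholder.

        # Select a feature subset and train a Multinomial Naive Bayes classifier.
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import MinMaxScaler
        from sklearn.feature_selection import SelectKBest, chi2
        from sklearn.naive_bayes import MultinomialNB

        def train_sentiment_classifier(X_train, y_train, n_features=20):
            # X_train: (900, 43) feature matrix of the filtered tweets; y_train: their labels.
            model = make_pipeline(
                MinMaxScaler(),                   # chi2 and MultinomialNB need non-negative input
                SelectKBest(chi2, k=n_features),  # keep an informative subset of the 43 features
                MultinomialNB(),
            )
            return model.fit(X_train, y_train)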

    How do annotators label short texts? Toward understanding the temporal dynamics of tweet labeling

    Crowdsourcing is a popular means to obtain human-crafted information, for example labels of tweets, which can then be used in text mining tasks. Many studies investigate the quality of the labels submitted by human annotators, but there is less work on understanding how annotators label. It is natural to expect that annotators learn how to annotate and do so gradually, in the sense that they do not know in advance which of the tweets they will see are positive and which are negative, but rather figure out gradually what makes up the positive and the negative sentiment in a tweet. In this paper, we investigate this gradual process and its temporal dynamics. We show that annotators undergo two phases: a learning phase, during which they build a conceptual model of the characteristics determining the sentiment of a tweet, and an exploitation phase, during which they use their conceptual model, although learning and refinement of the model continue. As a case study, we investigate a hierarchical tweet labeling task, distinguishing first between relevant and irrelevant tweets, before classifying the relevant ones into factual and non-factual, and further splitting the non-factual ones into positive and negative. As an indicator of learning we use the annotation time, i.e., the time elapsed while inspecting a tweet before the labels across the hierarchy are assigned to it. We show that this annotation time drops as an annotator proceeds through the set of tweets she has to process. We report on our results on identifying the learning phase and its follow-up exploitation phase, and on the differences in annotator behavior during each phase.
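
    One simple way to locate the transition between the two phases, sketched here as an assumption rather than the paper's exact analysis, is a change-point split of the annotation-time sequence that minimizes the within-segment variance.

        # Split a worker's annotation times into a "learning" prefix and an
        # "exploitation" suffix by minimizing the summed within-segment variance.
        import numpy as np

        def split_learning_phase(annotation_times, min_segment=5):
            t = np.asarray(annotation_times, dtype=float)
            best_idx, best_cost = None, np.inf
            for k in range(min_segment, len(t) - min_segment):
                cost = t[:k].var() * k + t[k:].var() * (len(t) - k)
                if cost < best_cost:
                    best_idx, best_cost = k, cost
            return best_idx  # index where the exploitation phase is assumed to start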

    Context-based extraction of concepts from unstructured textual documents

    Summarizing a collection of unstructured textual documents, e.g., lecture slides or book chapters, by extracting the most relevant concepts helps learners realize connections among these concepts. However, existing methods neglect the context in which concepts are extracted, even though a concept might be irrelevant in one context but relevant in another. To address this, we propose a novel unsupervised method for extracting the relevant concepts from a collection of unstructured textual documents, assuming that the documents are related to a certain topic. Our two-step method first identifies candidate concepts in the textual documents; it then infers context information for the input documents and ranks the candidates with respect to the inferred context. In the second step, this context information is enriched with more abstract information to improve the ranking. In our experiments we demonstrate that our method outperforms seven supervised and unsupervised approaches on five datasets and is competitive on the other two. Furthermore, we release three new benchmark datasets created from books in the educational domain. Our code and datasets are available at: https://github.com/gulsaima/COBEC.
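
    A much-simplified sketch of the two-step idea is shown below: candidate phrases are gathered from the collection and ranked against a context vector. The real method infers and enriches the context with more abstract information; the document centroid used here is only a crude stand-in for that step.

        # Gather candidate phrases and rank them against an approximate context vector.
        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        def rank_concepts(documents, top_k=10):
            vectorizer = TfidfVectorizer(ngram_range=(1, 3), stop_words="english")
            doc_matrix = vectorizer.fit_transform(documents)
            context = np.asarray(doc_matrix.mean(axis=0))   # centroid as a stand-in for the inferred context
            candidates = vectorizer.get_feature_names_out() # unigram-to-trigram candidate phrases
            cand_matrix = vectorizer.transform(candidates)
            scores = cosine_similarity(cand_matrix, context).ravel()
            return [candidates[i] for i in scores.argsort()[::-1][:top_k]]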